Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors
Authors
Abstract
Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond the label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method that conditions the high-dimensional similarities instead of the low-dimensional similarities, and stores within-label and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving scalability. From experiments on synthetic data, we find that our method resolves the considered problems and improves embedding quality. On real data containing batch effects, the expected improvement is not always there. We argue the revised method is preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.
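The core idea of conditioning the high-dimensional similarities can be illustrated with a toy sketch: compute Gaussian affinities and down-weight pairs that share a label, so that the embedding is driven by structure beyond the labels. This is only our reading of the idea; the function name, the discount factor `beta`, and the per-row normalization are assumptions, not the authors' implementation.

```python
import math

def conditioned_affinities(points, labels, sigma=1.0, beta=0.1):
    """Toy sketch: row-normalized Gaussian affinities in which
    same-label pairs are down-weighted by beta (an assumed scheme
    approximating ct-SNE's conditioning of high-dim similarities)."""
    n = len(points)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2 * sigma ** 2))
            if labels[i] == labels[j]:
                w *= beta  # discount similarity already explained by the label
            p[i][j] = w
        s = sum(p[i])
        if s > 0:
            p[i] = [v / s for v in p[i]]
    return p
```

With `beta < 1`, an equidistant across-label neighbor receives more affinity than a within-label one, which is the qualitative effect the revision aims for.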
Similar resources
Naive Bayes Image Classification: Beyond Nearest Neighbors
Naive Bayes Nearest Neighbor (NBNN) has been proposed as a powerful, learning-free, non-parametric approach for object classification. Its good performance is mainly due to the avoidance of a vector quantization step, and the use of image-to-class comparisons, yielding good generalization. In this paper we study the replacement of the nearest neighbor part with more elaborate and robust (sparse...
Quantifying long-range correlations in complex networks beyond nearest neighbors
We propose a fluctuation analysis to quantify spatial correlations in complex networks. The approach considers the sequences of degrees along shortest paths in the networks and quantifies the fluctuations in analogy to time series. In this work, the Barabasi-Albert (BA) model, the Cayley tree at the percolation transition, a fractal network model, and examples of real-world networks are studied...
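The building block of the analysis described above is the sequence of node degrees along a shortest path, which can then be treated like a time series. A minimal sketch of that building block, under our own assumptions (adjacency-list input, unweighted BFS shortest path; not the paper's code):

```python
from collections import deque

def degree_sequence_along_path(adj, src, dst):
    """Toy sketch: BFS shortest path in an unweighted graph given as
    an adjacency list, then the sequence of node degrees along it."""
    prev = {src: None}
    q = deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            break
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                q.append(v)
    path = []  # reconstruct the path by walking predecessors backwards
    u = dst
    while u is not None:
        path.append(u)
        u = prev[u]
    path.reverse()
    return [len(adj[u]) for u in path]
```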
Nearest-neighbors medians clustering
We propose a nonparametric cluster algorithm based on local medians. Each observation is substituted by its local median and this new observation moves toward the peaks and away from the valleys of the distribution. The process is repeated until each observation converges to a fixpoint. We obtain a partition of the sample based on the convergence points. Our algorithm determines the number of c...
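The iteration described above can be sketched in one dimension: replace each observation by the median of its k nearest neighbors and repeat until a fixpoint, then read off clusters from the convergence points. All names, the 1-D restriction, and the grouping tolerance are our own simplifications, not the authors' algorithm.

```python
import statistics

def median_shift_1d(xs, k=3, iters=50, tol=1e-9):
    """Toy 1-D sketch: each point moves to the median of its k nearest
    neighbors; iteration stops when no point moves more than tol."""
    pts = list(xs)
    for _ in range(iters):
        new = [statistics.median(sorted(pts, key=lambda y: abs(y - x))[:k])
               for x in pts]
        moved = max(abs(a - b) for a, b in zip(new, pts))
        pts = new
        if moved < tol:
            break
    # group converged points into clusters by a small gap threshold
    clusters = []
    for x in sorted(pts):
        if clusters and abs(x - clusters[-1][-1]) < 1e-6:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    return pts, len(clusters)
```

Note how the number of clusters is determined by the data rather than fixed in advance, which is the point made in the blurb.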
Boruvka Meets Nearest Neighbors
Computing the minimum spanning tree (MST) is a common task in the pattern recognition and the computer vision fields. However, little work has been done on efficient general methods for solving the problem on large datasets where graphs are complete and edge weights are given implicitly by a distance between vertex attributes. In this work we propose a generic algorithm that extends the classic...
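The setting above, a complete graph whose edge weights are given implicitly by distances between vertex attributes, can be illustrated with a plain Boruvka iteration. This is a naive O(n^2)-per-round sketch for intuition, not the paper's generic accelerated algorithm:

```python
def boruvka_mst(points):
    """Toy Boruvka sketch on a complete graph with implicit Euclidean
    edge weights: each round, every component picks its cheapest
    outgoing edge, and components are merged via union-find."""
    n = len(points)

    def dist(i, j):
        return sum((a - b) ** 2 for a, b in zip(points[i], points[j])) ** 0.5

    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges, comps = [], n
    while comps > 1:
        cheapest = {}  # component root -> (weight, i, j)
        for i in range(n):
            for j in range(i + 1, n):
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                cand = (dist(i, j), i, j)
                for r in (ri, rj):
                    if r not in cheapest or cand < cheapest[r]:
                        cheapest[r] = cand
        for w, i, j in cheapest.values():
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj
                edges.append((i, j, w))
                comps -= 1
    return edges
```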
Iterative Nearest Neighbors
Representing data as a linear combination of a set of selected known samples is of interest for various machine learning applications such as dimensionality reduction or classification. k-Nearest Neighbors (kNN) and its variants are still among the best-known and most often used techniques. Some popular richer representations are Sparse Representation (SR) based on solving an l1-regularized lea...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-30047-9_14